ABSTRACT

We studied patterns of authorship in computer science (CS) research in the
Philippines by applying data mining and graph theory techniques to archives of
scientific papers presented in the Philippine Computer Science Congresses from
2000 to 2010, involving 326 papers written by 605 authors. From these archives
we inferred several graphs, namely a paper-author bipartite graph, a
co-authorship graph, and two mixing graphs. Our results show that scientific
articles by Filipino computer scientists were generated at a rate of 33 papers
per year, and that the papers were written by an average of 2.64 authors
(maximum = 13). The frequency distribution of the number of authors per paper
follows a power law with a power of $\varphi = -2.04$ ($R^2 = 0.71$). The number
of Filipino CS researchers increases at an annual rate of 60 new scientists.
The researchers have written an average of 1.42 papers (maximum = 20) and have
collaborated with an average of 3.70 other computer scientists (maximum = 54).
The frequency distribution of the number of papers per author follows a power
law with $\varphi = -1.88$ ($R^2 = 0.83$). This distribution closely agrees with
Lotka’s law of scientific productivity, which has $\varphi \approx -2$. The
number of co-authors per author also follows a power law with
$\varphi = -1.65$ ($R^2 = 0.80$). These results suggest that most CS papers in
the country were written by scientists who prefer to work alone or, at most, in
small groups, and that few papers were written by scientists involved in large
collaboration efforts. The productivity of the Philippines’ CS researchers, as
measured by their number of papers, is positively correlated with their
participation in collaborative research efforts, as measured by their number of
co-authors (Pearson $r = 0.7425$). Filipino CS scientists follow a low
disassortative mixing when choosing a collaborator, either in terms of the
collaborator’s number of papers ($r = -0.1015$) or number of co-authors
($r = -0.0398$). This means that a Filipino CS researcher with high numbers of
papers and co-authors tends, weakly, to choose a collaborator whose numbers of
papers and co-authors are low.

*http://www.ics.uplb.edu.ph/jppabico
1. INTRODUCTION

The patterns of authorship of scientific research articles reflect how the
volume of knowledge was generated by the scientists in a country. The number of
quality papers that a nation’s researchers write within a time period reflects
the scientific productivity of that nation’s scientists. The number of authors
who wrote a particular research article, on the other hand, mirrors the
manpower needed to generate the knowledge embodied in the paper. The number of
co-authors that a scientist has indicates that scientist’s participation in
collaborative research efforts, as well as that scientist’s dependence on other
researchers to generate knowledge. This paper presents the authorship patterns
of computer science (CS) research in the Philippines as induced from the
archives of scientific papers presented in the Philippine Computer Science
Congresses (PCSC) from 2000 to 2010^1. Although the subject of this paper falls
under the CS subdisciplines of graph theory, data structures, information
retrieval and mining, visualization, and pattern discovery, the subject matter
will be of interest to the whole computing science community in the Philippines
for one reason: it is all about Filipino computer scientists. We hope that this
paper helps us understand several factors in CS research that are unique to the
Philippine setting. For example, we can quantify the bounds of the amount of
scientific knowledge that Filipino computer scientists generated, as well as
the bounds of the number of Filipino computing scientists who conducted
research in the past years. We can also identify the most prolific computer
scientists, as well as those with the most research collaborators. In general,
understanding the patterns of how Filipino computer scientists generate
knowledge may provide insight into information breakdowns, bottlenecks, and
structural holes in the scientific community of CS in the Philippines.

^1 The PCSC started in 2000 but the 2001 papers are not accessible to the
author. There was no PCSC conducted in 2002 P].

In recent years, the advent of advanced computer-based archiving technologies
has made most scientific works of the last 10 to 50 years easily accessible via
digital media by virtually anyone from anywhere. Examples of such archives are
the Los Alamos e-Print Archive (LAePA) [H], the Medline Database (Medline)
[37], the Stanford Public Information Retrieval System (SPIRES) [13], the
Networked Computer Science Technical Reference Library (NCSTRL) flTll, the DBLP
Computer Science Bibliography (DBLP) [14], the Samahang Pisika ng Pilipinas
(SPP) [34], the Transactions of the National Academy of Science and
Technology-Annual Scientific Meetings (NAST-ASM), and the Proceedings of the
Philippine Society of Agricultural and Biosystems Engineering (ABE) (Table 1).
These archives compile scientific papers that were recently used by some
researchers [3 8, Qjl, [18], [22]-[24], [38] who applied data mining techniques
to understand the complex nature of scientific research in various fields.
Results inferred from these archives show that the average number of papers per
author ranges from 2 to 7, while the papers were written by an average of 2 to
9 authors. Depending on the scientific discipline, a given author has an
average of 3 to 173 collaborators [7], [18].


Table 1: List of example scientific paper archives as used by various
researchers @,i, El m [22]-[24], [38]. The numbers of papers and of authors are
those at the time the studies were conducted by the respective authors.

Archive Name    Year Started    Number of Papers    Number of Authors
------------    ------------    ----------------    -----------------
International Archives
LAePA           1992            > 161,000           > 94,000
Medline         1961            > 216,000           > 152,000
SPIRES          1990            ≈ 66,000            > 56,000
NCSTRL          1974            ≈ 13,000            ≈ 12,000
DBLP            1960            ≈ 84,000            ≈ 95,000
Philippine-based Archives
SPP             2001            699                 840
Agriculture     2006            235                 645
NAST-ASM        2006            720                 1,780
ABE             2007            90                  171


In the Philippines, we have previously utilized the archives of scientific
posters presented at the recent NAST-ASM in an initial attempt to understand
the authorship patterns of Filipino agricultural scientists [22]. Although
Philippine-based scientific journals and proceedings in agricultural science
abound, we assumed that the papers compiled in the NAST-ASM archives represent
the majority of scientific knowledge discovered by Filipino agricultural
scientists, not only because of the sheer volume of knowledge they contain, but
also because of the quality of the knowledge presented, which has been
reviewed, and oftentimes authored, by no less than the nation’s Academicians
and National Scientists. The papers used in that study were categorized under
the Agricultural Sciences Division (ASD) and involved 235 poster abstracts
written by 645 authors over the four years from 2006 to 2009. In that study, we
found that Filipino agricultural scientists have written an average of 1.39
papers (maximum = 13), while they have collaborated with an average of 2.70
scientists (maximum = 28). Their research papers were written by an average of
3.81 Filipino authors (maximum = 15).

Using the same NAST-ASM archives, we recently expanded [24] the above study to
involve all six NAST scientific divisions encompassing various scientific
disciplines: the ASD; the biological sciences (BSD); the chemical, mathematical
and physical sciences (CMPSD); the engineering sciences and technology (ESTD);
the health sciences (HSD); and the social sciences (SSD). This expanded study
involved 720 papers written by 1,780 authors. Because of the sheer volume of
scientific discoveries contained in the archive, we assumed that the papers
represent the major scientific work of Filipino scientists in various
disciplines in the four years from 2006 to 2009. Again, our previous assumption
holds that the archives contain not only a high quantity of scientific
discoveries in the Philippines but, more importantly, high-quality research
results, for the same reason as mentioned above. The results of our analysis
show that Filipino scientists have written an average of 1.52 papers
(maximum = 40), while they have collaborated with an average of 2.82 scientists
(maximum = 66). The scientific papers have been written by an average of 3.70
Filipino authors (maximum = 22).

Using the NAST-ASM archives to infer the authorship patterns of scientists from
specific disciplines proved difficult, even though works of scientists in a
specific field might already be included in the archives. Examples of such
disciplines are Physics, Agricultural and Biosystems Engineering (ABE), and CS.
The reason is that the NAST-ASM archives did not label either the scientists or
the research works as belonging to Physics, ABE, or CS. In fact, Physics papers
are classified only under CMPSD, while ABE papers may be classified within two
of the six NAST divisions, namely ASD and ESTD. Both ASD and ESTD involve
papers from various other fields that are not ABE in nature, such as
entomology, biochemistry, forestry, information technology, and all other
engineering fields. Research from the CS discipline, on the other hand, may be
classified under CMPSD and ESTD, which also involve various other fields that
are not CS in nature. Thus, to analyze the authorship patterns of Filipino
scientists and researchers in specific disciplines, separate archives must be
used to better reflect the works and workers in the said discipline. In the
case of Physics and ABE, their respective archives exist as the Proceedings of
the Samahang Pisika ng Pilipinas [34] and the Proceedings of the Joint
International Agricultural Engineering Conference and Exhibition of the
Philippine Society of Agricultural Engineers (PSAE) [25]-[27]. Both proceedings
are archived in digital format. We have analyzed the authorship patterns of
Filipino physicists from 2001 to 2005 involving 699 papers written by 840
authors [38], as well as those of the ABE scientists over the recent 3-year
period from 2007 to 2009 involving 90 papers written by 171 authors [23]. Our
results in these studies are summarized in Table 2, together with the summary
of the previous works discussed above, for comparison purposes.

In this current effort to understand the authorship patterns of Filipino CS
researchers, we have applied data mining and graph theory techniques to
archives of papers presented in the PCSC [2]-[4], [6], [28]-[32] from 2000 to
2010. The 9-year archive has accumulated 326 papers written by 605 authors. We
have found that, on the average, the CS research papers were authored by 2.64
Filipino scientists, while the CS researchers themselves have written 1.42
papers and have collaborated with 3.70 other scientists. Aside from computing
these fundamental quantities to compare the CS community with other scientific
disciplines in the country, we also computed their respective frequency
distributions. The power-law nature of these distributions suggests that most
CS papers were authored by those who have few collaborators, while few of the
papers were authored by those who have a large list of collaborators. We have
found statistical evidence suggesting that the productivity of computer
scientists in the country is positively correlated with the scientists’
participation in collaborative research endeavors. We have also observed low
disassortative mixing among authors when choosing a collaborator in terms of
the collaborator’s scientific productivity, as well as the collaborator’s
number of collaborators. We hope that the results contributed by this paper
could later be used to aid the various stakeholders (e.g., funding agencies and
professional organizations) in providing opportunities to accelerate knowledge
generation in the field of CS in the country, as well as in strengthening the
efficiency and effectiveness of existing formal research and technical
communication channels.

2. MATERIALS AND METHODS 
2.1 Archive of Scientific Papers 

We have utilized the author information from 326 peer-reviewed papers presented
during the 2000 to 2010 PCSC [2]-[4], [6], [28]-[32]. The papers presented each
year are archived electronically in CD-ROM format, which is distributed to PCSC
participants and paper presenters during the conference. The CD-ROM contains
papers that are usually in portable document format (PDF) and comes with a
table of contents that is also in PDF. An easily parseable hypertext markup
language (HTML) format of the archive is also accessible from the website of
the Computing Society of the Philippines [g|.

Table 3 summarizes the particulars of the various PCSC, such as their
respective proceedings, the number of papers presented, and the number of
authors who wrote the papers during each year. The numbers of papers and
authors during the 2000 PCSC were close to their respective annual averages.
Both counts increased steadily in the earlier 4-year span from 2003 to 2006.
When the PCSC was held in Boracay in 2007, both the number of authors and the
number of papers dropped considerably. However, both counts gained momentum and
increased considerably in the recent 4-year span from 2007 to 2010. The 2010
PCSC received a record number of paper submissions, which is reflected in the
record-breaking number of papers accepted and presented, as well as in the
number of authors who wrote the papers. The 9-year PCSC has attracted an annual
average of 36 papers and 83 authors.

In this study, we considered a scientific paper as either a keynote paper, a
plenary (invited) paper, a tutorial paper, or a contributed paper. These paper
types are present in all PCSC, with the exception of the first year and the
latest two years. In the 2000 PCSC, a poster paper session (POSTERS) was
included, and the 2000 archive includes this paper type. In PCSC 2009, the
research-in-progress session (RIPS) was instituted. RIPS allows the oral
presentation of papers that are usually authored by undergraduate students and
are categorized by the paper review panelists as incomplete or in progress but
already worthy of oral presentation. The PCSC 2009 archive, however, did not
label whether a paper was RIPS or not. Thus, we assumed here that the 2009 PCSC
archive does not include the RIPS papers. In PCSC 2010, POSTERS was
reinstituted. Both RIPS and POSTERS papers are included in the 2010 PCSC
archive. However, we did not include these papers in our study because, as of
this writing, the author information is incomplete for papers with more than
one author.

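In our analysis of the co-authorship patterns, we considered an archive
$\mathcal{P} = \{P_1, P_2, \ldots, P_N\}$ of $N$ scientific papers, with each
paper $P_i$ having a list $\mathcal{A}_i = \{A_j \mid A_j \in \mathcal{A}\}$,
$|\mathcal{A}_i| = M_i$, of $M_i$ authors. From the author information in
$\mathcal{P}$, we created a database of distinct authors
$\mathcal{A} = \bigcup_{i=1}^{N} \mathcal{A}_i$ such that
$M = |\mathcal{A}| \le \sum_{i=1}^{N} M_i$. We note here that
$M = \sum_{i=1}^{N} M_i$ implies that the author lists are pairwise disjoint
($\mathcal{A}_i \cap \mathcal{A}_k = \emptyset$ for all $i \ne k$), which means
that all authors have written exactly one paper. Our results show that this is
not the case in Philippine CS research.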
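The relation between the number of distinct authors and the summed author-list sizes can be illustrated with a minimal sketch. It assumes a hypothetical two-paper archive stored as a Python dictionary; the names `archive`, `distinct_authors`, and `total_authorships` are illustrative and do not come from the study.

```python
# Hypothetical two-paper archive: each paper maps to its author list A_i.
archive = {
    "P1": ["A1", "A2"],
    "P2": ["A2", "A3", "A4"],
}

# Database of distinct authors: the union of all author lists.
distinct_authors = set().union(*archive.values())

M = len(distinct_authors)                                         # |A|
total_authorships = sum(len(names) for names in archive.values())  # sum of M_i

# M equals the sum only when no author appears on more than one paper.
every_author_wrote_once = (M == total_authorships)

print(M, total_authorships, every_author_wrote_once)
```

Here author A2 appears on both papers, so the number of distinct authors is smaller than the total number of authorships.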

2.2 Building the Paper-Author Bipartite Graph

Given $\mathcal{P}$ and $\mathcal{A}$, we built the paper-author bipartite
graph $PAG = (\mathcal{P} \cup \mathcal{A}, \mathcal{E})$, where
$\mathcal{E} = \{(i, j) \mid P_i \in \mathcal{P}, A_j \in \mathcal{A}_i\}$. For
each paper $P_i$, we created a bipartite subgraph (sub-bigraph) $PAG_i$
composed of a type-P vertex labeled $P_i$ and $M_i$ type-A vertices with the
respective labels as in $\mathcal{A}_i$. We then created edges in $PAG_i$ by
connecting the type-P vertex with all the $M_i$ type-A vertices. The $i$th
sub-bigraph induced by $P_i$ represents the one-to-many relationship between
the $i$th paper and its $M_i$ authors. We then connected all $N$ sub-bigraphs
via each sub-bigraph’s common type-A vertices. The resulting graph
$\bigcup_{i=1}^{N} PAG_i$ is the paper-author bipartite graph $PAG$. Naively,
$PAG$ may be built with a time complexity of $O(N \times M)$, but we reduced
this to $O(N \times \log M)$ by using a balanced binary tree structure for
$\mathcal{A}$.
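The construction above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it swaps the balanced binary tree for a Python dictionary (a hash map) as the author index, and the function name `build_pag` is hypothetical.

```python
def build_pag(author_lists):
    """Return (author_index, edges) of the paper-author bipartite graph.

    author_lists[i - 1] is the author list of paper P_i. Each edge (i, j)
    links paper P_i to author A_j, using 1-based indices as in the text.
    """
    author_index = {}  # author name -> index j (hash map, not a balanced tree)
    edges = []
    for i, authors in enumerate(author_lists, start=1):
        for name in authors:
            # Look up the author; register a new type-A vertex if unseen.
            j = author_index.setdefault(name, len(author_index) + 1)
            edges.append((i, j))
    return author_index, edges


author_index, edges = build_pag([["A1", "A2"], ["A2", "A3", "A4"]])
print(edges)
```

Each author lookup is a single dictionary access, so the sub-bigraphs are joined through their common authors without scanning the whole author database for every name.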

Figure 1(a-c) shows how the $PAG$ was created for a
hypothetical paper archive $\mathcal{P}$ composed of two papers $P_1$

Table 2: Fundamental statistics of various scientific collaboration networks:
average number of authors per paper ($Ap_{avg}$), average number of papers per
author ($Pa_{avg}$), and average number of co-authors per author ($Ca_{avg}$).

Scientific Discipline    No. of Years    $Ap_{avg}$    $Pa_{avg}$    $Ca_{avg}$    Literature Reference
---------------------    ------------    ----------    ----------    ----------    --------------------
International Research
Biomedical Research      40              3.75          6.40          18.10         Newman [18]
High-energy Physics      27              8.96          11.60         173.00        Newman [18]
CS                       10              2.22          2.55          3.59          Newman [18]
Philippine-based Research
Physics                  5               3.16          -             10.80         Villanueva and Pabico [38]
Agriculture              4               3.81          1.39          2.70          Pabico [22]
Various Fields           4               3.70          1.52          2.82          Pabico and Micor [24]
ABE                      3               3.02          1.59          2.35          Pabico [23]
CS                       9               2.68          1.42          3.58          This study

Table 3: Basic information about the 2000 to 2010 Philippine Computing Science
Congress: year and site each was held, number of papers presented, number of
authors, and proceedings reference.

Year       PCSC Site                 Number of Papers    Number of Authors    Proceedings Reference     Remarks
-------    ----------------------    ----------------    -----------------    ----------------------    ------------------
2000       De La Salle University    35                  78                   Azcarraga [6]             POSTERS
2001       MSU-IIT                   -                   -                    -                         Data not available
2002       -                         -                   -                    -                         Not held
2003       Philippine Science HS     15                  31                   Saldana and Caro [31]
2004       UP Los Banos              29                  61                   Albacea et al. [4]
2005       University of Cebu        33                  80                   Saldana and Chua [32]
2006       Ateneo de Manila          38                  101                  Saldana [28]
2007       Boracay Island            33                  74                   Saldana [29]
2008       UP Diliman                37                  76                   Saldana [30]
2009       Silliman University       41                  93                   Adorna and Saldana [3]    RIPS
2010       Ateneo de Davao           61                  148                  Adorna [2]                RIPS and POSTERS
Average                              36                  83


and $P_2$ written by authors $A_1$, $A_2$, $A_3$, and $A_4$. In this scenario,
$P_1$ was co-authored by $A_1$ and $A_2$, while $P_2$ was jointly written by
$A_2$, $A_3$, and $A_4$. In both papers, $A_2$ was the common author.
Separately, the sub-bigraph induced by $P_1$ is
$PAG_1 = (\{P_1, A_1, A_2\}, \{(1,1), (1,2)\})$, while the sub-bigraph induced
by $P_2$ is $PAG_2 = (\{P_2, A_2, A_3, A_4\}, \{(2,2), (2,3), (2,4)\})$. The
sub-bigraphs $PAG_1$ and $PAG_2$ are connected through the common vertex $A_2$
to create $PAG = (\mathcal{P} \cup \mathcal{A}, \mathcal{E})$, where
$\mathcal{P} = \{P_1, P_2\}$, $\mathcal{A} = \{A_1, A_2, A_3, A_4\}$, and
$\mathcal{E} = \{(1,1), (1,2), (2,2), (2,3), (2,4)\}$.

Figure 1(e-g) shows the flow diagram of the procedure on how $CAG$ was created
from the hypothetical example mentioned above. The co-authorship subgraph
induced by $P_1$ is $CAG_1 = (\{A_1, A_2\}, \{(1,2)\})$, while the
co-authorship subgraph induced by $P_2$ is
$CAG_2 = (\{A_2, A_3, A_4\}, \{(2,3), (2,4), (3,4)\})$. The subgraphs $CAG_1$
and $CAG_2$ are connected through the common vertex $A_2$ to create the
co-authorship graph $CAG(\mathcal{A}, \mathcal{E}_c)$, where
$\mathcal{A} = \{A_1, A_2, A_3, A_4\}$ and
$\mathcal{E}_c = \{(1,2), (2,3), (2,4), (3,4)\}$.


2.3 Building the Co-authorship Graph 

We built the co-authorship graph $CAG$ from $PAG$ as follows. For each vertex
$P_i \in PAG$, we deleted all edges incident to $P_i$, as well as $P_i$ itself,
and created in its stead a complete subgraph
$CAG_i = (\mathcal{A}_i, \mathcal{E}_i)$, where
$\mathcal{E}_i = \{(j, k) \mid A_j, A_k \in \mathcal{A}_i, j \ne k\}$ and
$|\mathcal{E}_i| = M_i(M_i - 1)/2$, connecting all pairwise combinations of
$A_j, A_k \in \mathcal{A}_i$, $j \ne k$. The fully-connected subgraph $CAG_i$
represents the co-authorship graph of the authors who co-wrote the $i$th paper
$P_i$. The resulting graph $CAG = \bigcup_{i=1}^{N} CAG_i$ is the co-authorship
graph of CS researchers in the Philippines. Because some authors have not
collaborated, some vertices $A_i \in CAG$ are not connected to any of the other
vertices $A_j \in CAG$.
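A minimal sketch of this step, assuming each paper is given as a list of integer author indices (the function name `build_cag` and the sole-author paper in the second call are hypothetical), could look like this:

```python
from itertools import combinations


def build_cag(author_lists):
    """Return (vertices, edges) of the co-authorship graph.

    Each paper contributes a complete subgraph over its authors. Edges are
    stored as unordered pairs (j, k) with j < k, so repeated collaborations
    between the same two authors collapse into a single edge.
    """
    vertices = set()
    edges = set()
    for authors in author_lists:
        vertices.update(authors)
        # All pairwise combinations: M_i * (M_i - 1) / 2 edges per paper.
        for j, k in combinations(sorted(set(authors)), 2):
            edges.add((j, k))
    return vertices, edges


# Hypothetical archive from the text: P_1 by A_1, A_2; P_2 by A_2, A_3, A_4.
vertices, edges = build_cag([[1, 2], [2, 3, 4]])

# Adding a sole-author paper leaves that author as an isolated vertex.
vertices_solo, edges_solo = build_cag([[1, 2], [2, 3, 4], [5]])
isolated = vertices_solo - {v for edge in edges_solo for v in edge}
print(sorted(edges), isolated)
```

Storing edges in a set keeps the graph simple (unweighted), which matches the binary co-authorship matrix used later in the text.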


In building $CAG$, we adopted the same assumptions made by Newman [if]: (1)
that all pairs of authors $A_i$ and $A_j$, $\forall i \ne j$, who have written
a paper together are genuinely acquainted with one another; and (2) that the
co-authorship graph $CAG$ reflects a genuine professional interaction between
Filipino computer scientists.

2.4 Computing for node degrees 

From $PAG$, we can infer an $N \times M$ matrix $PAM$ that mathematically
represents the adjacency of $PAG$. Each matrix element $PAM_{i,j} = 1$ if the
$i$th paper is written or co-written by the $j$th author. Otherwise,
$PAM_{i,j} = 0$. The $PAM$ of the hypothetical $PAG$ discussed above is shown
in Figure 1(d). Note that $PAM_{i,j} \le 1$, as no distinct author name appears
more than once in the author line of a paper.

















Figure 1: The process flow for building the paper-author bipartite graph $PAG$
and the paper-author matrix $PAM$ using an archive with two hypothetical papers
$P_1$ and $P_2$: (a) the hypothetical paper $P_1$ and its corresponding
sub-bigraph; (b) the hypothetical paper $P_2$ and the sub-bigraph induced by
it; (c) the resulting $PAG = PAG_1 \cup PAG_2$; and (d) the equivalent $PAM$.
The process flow for building the co-authorship graph $CAG$ and the
co-authorship matrix $CAM$: (e) deleting vertex $P_1$ and edges (1,1) and
(1,2), and creating the completely connected subgraph $CAG_1$; (f) deleting
vertex $P_2$ and edges (2,2), (2,3), and (2,4), and creating the
fully-connected subgraph $CAG_2$; (g) the resulting $CAG = CAG_1 \cup CAG_2$;
and (h) the equivalent $CAM$. The process flow for transforming $CAG$ into
$CAAG$ and the corresponding mixing matrix $CAAM$: (i) the mixing network when
$\tau = \Delta$; and (j) the resulting $CAAM$. In the visualization of the
different graphs, square vertices represent papers while circle vertices
represent authors.


The degree $\Delta_i^{\mathcal{P}}$ of the $i$th P-type vertex $P_i$ represents
the number of authors that wrote paper $P_i$. Conversely, the degree
$\Delta_j^{\mathcal{A}}$ of the $j$th A-type vertex $A_j$ represents the number
of papers that author $A_j$ wrote. The vertex degrees $\Delta_i^{\mathcal{P}}$
and $\Delta_j^{\mathcal{A}}$ can be computed using the matrix $PAM$ as shown in
Equations 1 and 2, respectively. We can use the vertex degrees to compute the
minimum ($Ap_{min}$), average ($Ap_{avg}$), and maximum ($Ap_{max}$) number of
authors per paper (Equations 4 to 6), as well as the minimum ($Pa_{min}$),
average ($Pa_{avg}$), and maximum ($Pa_{max}$) number of papers per author
(Equations 7 to 9).

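In the hypothetical archive discussed above, $\Delta_1^{\mathcal{P}} = 2$ while
$\Delta_2^{\mathcal{P}} = 3$. Conversely, $\Delta_1^{\mathcal{A}} = 1$,
$\Delta_2^{\mathcal{A}} = 2$, $\Delta_3^{\mathcal{A}} = 1$, and
$\Delta_4^{\mathcal{A}} = 1$. Thus $Ap_{min} = 2$, $Ap_{avg} = 2.5$, and
$Ap_{max} = 3$. Similarly, $Pa_{min} = 1$, $Pa_{avg} = 1.25$, and
$Pa_{max} = 2$.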
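The values quoted for the hypothetical archive can be reproduced with a short sketch. The paper-author matrix is written here as a plain list of lists, and the variable names are illustrative rather than taken from the study.

```python
# Paper-author matrix PAM of the hypothetical archive (rows: papers P_1, P_2;
# columns: authors A_1..A_4).
pam = [
    [1, 1, 0, 0],  # P_1 written by A_1 and A_2
    [0, 1, 1, 1],  # P_2 written by A_2, A_3, and A_4
]
N, M = len(pam), len(pam[0])

# Row sums give the number of authors per paper (degree of each P-type vertex).
paper_degree = [sum(row) for row in pam]

# Column sums give the number of papers per author (degree of each A-type vertex).
author_degree = [sum(pam[i][j] for i in range(N)) for j in range(M)]

ap_min, ap_avg, ap_max = min(paper_degree), sum(paper_degree) / N, max(paper_degree)
pa_min, pa_avg, pa_max = min(author_degree), sum(author_degree) / M, max(author_degree)

print(paper_degree, author_degree)
print(ap_min, ap_avg, ap_max, pa_min, pa_avg, pa_max)
```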

From $CAG$, we can infer an $M \times M$ symmetric co-authorship matrix $CAM$
that mathematically represents ties between the $M$ scientists. Each matrix
element $CAM_{j,k} = CAM_{k,j} = 1$ if and only if author $A_j$ has
collaborated with author $A_k$ on at least one paper. Since collaboration is a
symmetric relation, $CAM_{j,k} = 1$ implies $CAM_{k,j} = 1$, which means that
author $A_k$ collaborates with author $A_j$ in return. Without loss of
generality, we set all diagonal elements $CAM_{j,j} = 0$. If $A_j$ has not
collaborated with $A_k$, then $CAM_{j,k} = CAM_{k,j} = 0$. Figure 1(h) shows
the computed $CAM$ of the hypothetical $CAG$. Using $CAM$, the vertex degree
$\Delta_i^{\mathcal{C}}$ of the $i$th author, which reflects the number of
co-authors $A_i$ has, is computed as shown in Equation 3, while the minimum
$Ca_{min}$, average $Ca_{avg}$, and maximum $Ca_{max}$ number of co-authors are
respectively computed as in Equations 10 to 12.



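\begin{align}
\Delta_i^{\mathcal{P}} &= \sum_{j=1}^{M} PAM_{i,j} \tag{1} \\
\Delta_j^{\mathcal{A}} &= \sum_{i=1}^{N} PAM_{i,j} \tag{2} \\
\Delta_i^{\mathcal{C}} &= \sum_{j=1}^{M} CAM_{i,j} \tag{3} \\
Ap_{min} &= \min_{i=1}^{N} \Delta_i^{\mathcal{P}} \tag{4} \\
Ap_{avg} &= \frac{\sum_{i=1}^{N} \Delta_i^{\mathcal{P}}}{N} \tag{5} \\
Ap_{max} &= \max_{i=1}^{N} \Delta_i^{\mathcal{P}} \tag{6} \\
Pa_{min} &= \min_{j=1}^{M} \Delta_j^{\mathcal{A}} \tag{7} \\
Pa_{avg} &= \frac{\sum_{j=1}^{M} \Delta_j^{\mathcal{A}}}{M} \tag{8} \\
Pa_{max} &= \max_{j=1}^{M} \Delta_j^{\mathcal{A}} \tag{9} \\
Ca_{min} &= \min_{i=1}^{M} \Delta_i^{\mathcal{C}} \tag{10} \\
Ca_{avg} &= \frac{\sum_{i=1}^{M} \Delta_i^{\mathcal{C}}}{M} \tag{11} \\
Ca_{max} &= \max_{i=1}^{M} \Delta_i^{\mathcal{C}} \tag{12}
\end{align}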
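As an illustration of the co-authorship matrix and of Equations 3 and 10 to 12, the following sketch derives a binary co-authorship matrix directly from the paper-author matrix of the hypothetical two-paper archive. Deriving it this way is an assumption made for brevity; the text builds it from the co-authorship graph, which yields the same matrix.

```python
# Paper-author matrix of the hypothetical archive (rows: P_1, P_2).
pam = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
M = len(pam[0])

# cam[j][k] = 1 if authors j and k share at least one paper; diagonal stays 0.
cam = [[0] * M for _ in range(M)]
for row in pam:
    for j in range(M):
        for k in range(M):
            if j != k and row[j] and row[k]:
                cam[j][k] = 1

# Equation 3: the row sum is the number of co-authors of each author.
coauthor_degree = [sum(row) for row in cam]

# Equations 10 to 12.
ca_min = min(coauthor_degree)
ca_avg = sum(coauthor_degree) / M
ca_max = max(coauthor_degree)

print(cam)
print(coauthor_degree, ca_min, ca_avg, ca_max)
```

In this example author $A_2$ has three co-authors because $A_2$ is the only author common to both papers.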


2.5 Degree Distributions in PAG and CAG

The frequency distribution $p(\Delta)$ of a vertex degree $\Delta$ is a
graph-based quantity that has been much studied and applied recently for
various co-authorship graphs flU . Hal and social networks am. It provides the
frequency with which a randomly selected vertex has $\Delta$ edges (or
degrees). Graphs in which high-degree vertices are few have long-tailed
$p(\Delta)$ and are called scale-free graphs. Such graphs follow the power-law
distribution (Equation 13) and oftentimes model the relationships of naturally
occurring entities, such as proteins and their interactions [33]. We
hypothesized that $PAG$ and $CAG$ are scale-free and thus their respective
$p(\Delta)$ follow a power law. To test this hypothesis, we fitted a power-law
line to each of $p(\Delta^{\mathcal{P}})$, $p(\Delta^{\mathcal{A}})$, and
$p(\Delta^{\mathcal{C}})$ and statistically tested whether the power is
significantly different from zero at $\alpha = 0.05$ (where $\alpha$ is taken
as the probability of the two-tailed alternative greater than the test
statistic). The power-law distribution is statistically estimated by the
frequency $y$ in Equation 13 and involves the vertex degree $\Delta$, a
constant $c$, and the power $\varphi$, which is also known as the fractal
dimension dj|. We estimated the values of $c$ and $\varphi$ by using linear
regression analysis on the power law’s linear form (Equation 14).


\begin{align}
y &= c \Delta^{\varphi} \tag{13} \\
\log y &= \log c + \varphi \log \Delta \tag{14}
\end{align}
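The regression on the linear form can be sketched as an ordinary least-squares fit on log-transformed values. The function name `fit_power_law` and the synthetic data below are hypothetical and serve only to check the arithmetic; the sketch does not include the significance test described in the text.

```python
import math


def fit_power_law(degrees, frequencies):
    """Fit y = c * degree**phi by least squares on the log-log form.

    Degrees with zero frequency must be left out, since log(0) is undefined.
    Returns (c, phi, r_squared).
    """
    xs = [math.log10(d) for d in degrees]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx                     # slope of the log-log line
    log_c = mean_y - phi * mean_x       # intercept of the log-log line
    r_squared = (sxy * sxy) / (sxx * syy)
    return 10 ** log_c, phi, r_squared


# Synthetic data that follow y = 100 * degree**(-2) exactly.
degrees = [1, 2, 4, 8]
frequencies = [100 * d ** -2 for d in degrees]
c, phi, r_squared = fit_power_law(degrees, frequencies)
print(c, phi, r_squared)
```

The choice of base-10 logarithms is arbitrary; any base gives the same slope, and only the intercept needs to be converted back with the matching base.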

2.6 Productivity and collaboration 

An author $A_i \in \mathcal{A}$ has an inherent vector of valued attributes
$(\tau_1, \tau_2)$, where in this research we set $\tau_1 = Pa$ and
$\tau_2 = Ca$. We hypothesized that $Pa$ and $Ca$ have a high positive
correlation, such that authors who are productive, as measured by their high
$Pa$, are also those who take part in many collaboration efforts, as measured
by their high $Ca$. High positive correlation would also mean that authors who
are less productive (i.e., low $Pa$) are those who write alone or whose number
of collaborators is relatively small (i.e., low $Ca$). We tested the hypothesis
by estimating the Pearson correlation $r$ and statistically testing it against
zero (i.e., we hypothesized that $r \ne 0$). We utilized the Pearson statistic
because the causality relation between $Pa$ and $Ca$ was not established (i.e.,
we do not know whether $Pa$ causes $Ca$, or vice versa, or whether such a
relation exists at all).
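For illustration, the Pearson correlation between the number of papers and the number of co-authors can be computed as in the sketch below. It uses the four authors of the hypothetical two-paper archive rather than the PCSC data, and the helper name `pearson_r` is an assumption of this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical archive: papers per author and co-authors per author
# for A_1..A_4 (P_1 by A_1, A_2; P_2 by A_2, A_3, A_4).
pa = [1, 2, 1, 1]
ca = [1, 3, 2, 2]
r = pearson_r(pa, ca)
print(round(r, 4))
```

With only four authors the coefficient is not meaningful statistically; the example only shows how the two per-author attributes enter the computation.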

2.7 Assortativity in CAG

Given an attribute $\tau$ of a vertex, the assortativity $r$ of a graph is the
tendency of vertices to be connected to like vertices [20], such that there are
more edges between vertices with high $\tau$ values than between a high-$\tau$
vertex and a low-$\tau$ vertex. We started its computation by relabeling each
vertex $A_i \in CAG(\mathcal{A}, \mathcal{E}_c)$ by its $\tau$, and converting
all undirected edges in $\mathcal{E}_c$ to bidirectional edges to create
$\mathcal{E}_d$. The resulting graph is $CAAG(\mathcal{A}', \mathcal{E}_d)$,
where $\mathcal{A}'$ is just the relabeled vertices in $\mathcal{A}$, and
$|\mathcal{E}_d| = 2 \times |\mathcal{E}_c|$. We used a mixing matrix $CAAM$,
where each matrix element $CAAM_{i,j}$ represents the fraction of all edges in
$CAAG$ that start at $a_i$ and end at $a_j$, such that
$\sum_{i,j} CAAM_{i,j} = 1$. Let $f_i$ be the fraction of all edges in $CAAG$
that are incident to $a_i$, thus $f_i = \sum_{j} CAAM_{i,j}$. The assortativity
$r$ can be approximated by the Pearson correlation coefficient discussed by
Newman [20] and subsequently used by Bird et al. Q]. Assortative mixing
($r > 0$) means that vertices in $CAG$ tend to be connected to vertices with
similar $\tau$, with all vertices connected only to similar ones in the extreme
case. Disassortative mixing (or negative assortativity, $r < 0$) means that
high-$\tau$ vertices tend to be connected to low-$\tau$ ones. Using the degree
$\Delta$ as $\tau$, Figure 1(i-j) shows how the $CAG$ of the hypothetical
example discussed above was transformed into $CAAG$, as well as how the
$CAAM_{i,j}$ was computed. In this paper, we independently used $Pa$ and $Ca$
as $\tau$ to separately discover the general preference of CS researchers in
choosing a collaborator in terms of the collaborator’s $Pa$ and $Ca$,
respectively.
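One way to sketch this computation is to take the Pearson correlation of the attribute values found at the two ends of every bidirectional edge. This is the scalar-attribute form of the coefficient and is an assumption of this example; it works on the edge list directly instead of going through the mixing matrix, and the function name `assortativity` is hypothetical.

```python
import math


def assortativity(edges, tau):
    """Pearson correlation of tau over the two ends of every directed edge.

    edges: undirected edges (j, k); each is counted in both directions.
    tau:   mapping from vertex to its attribute value.
    """
    directed = list(edges) + [(k, j) for j, k in edges]
    xs = [tau[j] for j, _ in directed]  # attribute at the start of each edge
    ys = [tau[k] for _, k in directed]  # attribute at the end of each edge
    n = len(directed)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical co-authorship graph, with the vertex degree used as tau.
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
degree = {1: 1, 2: 3, 3: 2, 4: 2}
r = assortativity(edges, degree)
print(round(r, 4))
```

The small example graph is disassortative by degree: its best-connected author, $A_2$, is linked only to authors with fewer co-authors.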

3. RESULTS AND DISCUSSION 
3.1 The PCSC Paper Archive 

For this study, we utilized the archive $\mathcal{P}$ of papers presented
during the 2000 to 2010 PCSC to infer the authorship patterns of Filipino
computer scientists. The total number of papers presented in these conventions
is $N = 326$, while the number of authors is $M = 605$. As pointed out by
Newman [ll|, one particular issue that we were concerned about was the number
of names $L$ that appear in $\mathcal{P}$ and clearly identify distinct
authors. This is because $L$ is not necessarily the same as $M$. For example,
author $A_i$ may format his name differently on different papers, such that the
names Juan dela Cruz, Dela Cruz, Juan, and J. dela Cruz could all belong to
him. This scenario would mean that $M = 3$, but in fact $L = 1$. Conversely,
two distinct authors $A_i$ and $A_j$ may have the same name, such that the name
Maria Maquiling could belong to both. This means that $M = 1$, while in fact
$L = 2$. This name ambiguity problem has already been given approximate
solutions by various techniques [ToL ITU . l3a [39 1] that use additional
information found in the papers, such as the names of the authors’ respective
home institutions and their subdisciplines. However, we could not use this
additional information because there are authors who belong to more than one
institution and, due to multi-specialty research collaborations, they could
publish in other subdisciplines. Further, the author information in
$\mathcal{P}$ rarely includes the subdisciplines. To address these issues, we
performed our analysis using each author’s surname and the initials of the
first and second names, knowing full well that we may be overestimating the
true value of $M$. In this regard, having $L > M$ in this research may give us
a guarantee that our results provide the respective upper bounds of the
patterns.
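The surname-plus-initials convention can be sketched as a small normalization function. Everything in it is an assumption made for illustration: the function name `author_key`, the list of surname particles, and the two accepted name layouts ("Given Surname" and "Surname, Given") are hypothetical and are not taken from the study.

```python
# Hypothetical list of surname particles; a real archive may need more.
PARTICLES = {"de", "del", "dela", "delos", "la", "san", "santa"}


def author_key(name):
    """Reduce a name to 'surname, initials' in lowercase.

    Up to two initials (first and second names) are kept.
    """
    if "," in name:
        # "Surname, Given Names" layout.
        surname, given = (part.strip() for part in name.split(",", 1))
        surname_tokens = surname.split()
        given_tokens = given.split()
    else:
        # "Given Names Surname" layout: the surname is the last token plus
        # any particles that come right before it.
        tokens = name.split()
        start = len(tokens) - 1
        while start > 0 and tokens[start - 1].lower().rstrip(".") in PARTICLES:
            start -= 1
        given_tokens, surname_tokens = tokens[:start], tokens[start:]
    initials = "".join(token[0] for token in given_tokens)[:2]
    return " ".join(surname_tokens).lower() + ", " + initials.lower()


variants = ["Juan dela Cruz", "Dela Cruz, Juan", "J. dela Cruz"]
keys = {author_key(v) for v in variants}
print(keys)
```

All three spellings of the example name collapse to one key, while two distinct people who share a surname and initials would also be merged, which is the ambiguity described above.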

Figure 2 shows the annual trend of the cumulative number of authors and papers
presented in the 9-year PCSC. Based on simple regression analysis, we found
that PCSC has attracted about 60 new authors per year who helped co-write about
33 new papers annually. Extrapolating these lines 5 years into the future, we
estimate that in 2015 the number of distinct authors contributing to PCSC will
reach 843, while the number of papers contributed will reach 458.
Figure 2: Annual trend of the cumulative number of papers (red square) and
authors (blue diamond). The red solid and dashed lines, respectively, represent
the regression and the 5-year extrapolation line of the yearly cumulative
number of papers (slope = 32.99, $R^2 = 0.95$). The blue solid and dashed
lines, respectively, represent the regression and the 5-year extrapolation line
of the yearly cumulative number of authors (slope = 59.54, $R^2 = 0.96$). (This
figure is in color in the digital format of this paper.)

3.2 Inferences from PAG

Table 4 summarizes the values inferred from $PAG$. On the average, the CS
authors in the Philippines have written about 1.42 papers, while papers were
written by an average of 2.64 authors. The Filipino authors have collaborated,
on the average, with 3.70 other authors. We have shown the comparison of these
simple statistics with various other national and international research
co-authorship graphs (Table 5). As also inferred from $PAG$, we have identified
the top five researchers with the most papers in the archive: PC Naval (20
papers), RP Saldana (19), HN Adorna (16), RC Sison (15), and REO Roxas and JDL
Caro (10 each). We have annotated the vertices in Figures 3 and 4 to visualize
the respective relative positions of these authors in $PAG$ and $CAG$.

3.3 Number of authors per paper

The Filipino CS research papers have been written on the average by 2.64
authors, which is lower than that of ABE ($Ap_{avg} = 3.02$), the agricultural
sciences ($Ap_{avg} = 3.81$), and the NAST sciences ($Ap_{avg} = 3.70$) in the
country. This means that in the Philippines, creating new scientific
information requires fewer authors in CS than in other disciplines. In the
international co-authorship graphs, more authors are needed to write new
information in the field of biomedical research ($Ap_{avg} = 3.75$), and
significantly more authors in high-energy physics ($Ap_{avg} = 8.96$). However,
Filipino CS research papers needed more authors on the average than
international CS papers ($Ap_{avg} = 2.22$).


3.4 Number of papers per author 

On average, the Filipino CS researchers have written
fewer papers ($P_{A,\mathrm{avg}} = 1.42$) than their ABE
($P_{A,\mathrm{avg}} = 1.59$) and NAST
($P_{A,\mathrm{avg}} = 1.52$) counterparts, but more than
the agricultural scientists ($P_{A,\mathrm{avg}} = 1.39$)
in the country. However, the average scientific
productivity of Filipino computer scientists, measured by
the number of papers written per author, still falls
behind the international averages. The international
biomedical researchers, high-energy physicists, and
computer scientists have respectively written an average
of 6.4, 11.6, and 2.55 papers.
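
As a quick consistency check on the two reported averages (a worked equation added for illustration, with $N_P$, $N_A$ and $|E_{PAG}|$ being notation assumed here rather than the paper's own), the number of paper-author links in the bipartite graph can be counted from either side and should agree up to rounding:

```latex
\[
  |E_{PAG}| \;=\; N_P \, A_{P,\mathrm{avg}} \;=\; N_A \, P_{A,\mathrm{avg}},
  \qquad
  326 \times 2.64 \approx 860.6,
  \qquad
  605 \times 1.42 \approx 859.1 .
\]
```

The two products differ by less than 0.2%, which is within the rounding of the averages to two decimal places.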

3.5 Inferences from CAG

Figure 4 presents a visualization of the co-authorship
graph CAG created from the papers in V. In this
visualization, it can easily be seen that the graph of CS
research co-authorship in the Philippines is composed
of disconnected subgraphs. We have found that the
authors in each of the subgraphs belong to the same
institution. This means that CS authors collaborate
only with authors who belong to the same institution,
and that cross-institution collaborations do not yet
exist in the Philippine setting. It is understandable,





Figure 3: The paper-author bigraph PAG drawn with the graph visualization algorithm by Kamada
and Kawai [12]. In this visualization, colored circles represent authors while gray squares represent
papers. The labels correspond to some identified authors with the largest number of papers:
(a) PC Naval, (b) HN Adorna, (c) RP Saldana, (d) R Sison, (e) REO Roxas, and (f) D Cheng. In
addition, (g) is this journal’s editor-in-chief EA Albacea, while (h) is this paper’s author. (This figure
is in color in the digital format of this paper.)


Table 4: Values of inferred statistics from PAG and CAG: minimum, average and maximum number
of authors per paper (A_P), number of papers per author (P_A), and number of collaborators per author
(C_A), as well as the respective degree distribution’s power law coefficients (ψ) and the corresponding
statistics (R^2).

                                                                          Degree distribution
Statistic                                  Minimum   Average   Maximum    ψ        R^2
Number of authors per paper (A_P)          1         2.64      13         -2.04    0.71
Number of papers per author (P_A)          1         1.42      20         -1.88    0.83
Number of collaborators per author (C_A)   0         3.70      54         -1.65    0.80


however, that few nationally important computational
problems exist, or have been identified, at present that
would bring researchers from several institutions
together to solve a common problem. We have also
identified and labeled some central authors in some of
the subgraphs. The top five scientists with the largest
number of collaborators are PC Naval with 54
collaborators, RC Sison with 30, D Cheng with 29,
RP Saldana with 25, and HN Adorna with 20. We believe
that these authors, together with those whom we
identified as having the largest number of papers, are
the central scientists in their respective subgraphs. By
central we mean the most influential person among the
connected authors in the subgraph.
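
The construction of a co-authorship graph from per-paper author lists, together with the collaborator counts and disconnected subgraphs discussed above, can be sketched as follows. This is an illustrative sketch, not the procedure used in this study; the function names and the toy author labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cag(papers):
    """Map each author to the set of distinct co-authors.

    `papers` is a list of author lists, one list per paper.
    """
    cag = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for author in unique:
            _ = cag[author]  # register solo authors with no collaborators
        for a, b in combinations(unique, 2):
            cag[a].add(b)
            cag[b].add(a)
    return dict(cag)


def connected_components(cag):
    """Return the disconnected subgraphs as a list of author sets."""
    seen, components = set(), []
    for start in cag:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(cag[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy archive with hypothetical author labels.
papers = [["A", "B", "C"], ["A", "D"], ["E"], ["F", "G"]]
cag = build_cag(papers)
collaborators = {author: len(peers) for author, peers in cag.items()}
most_connected = max(collaborators, key=collaborators.get)
subgraphs = connected_components(cag)
```

In this toy archive, author "A" has the most collaborators and the graph splits into three subgraphs, mirroring on a small scale the hub-and-island structure described for CAG.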

3.6 Number of collaborators per author 

In the area of collaborative research, the Filipino
CS researchers have collaborated with more researchers
($C_{A,\mathrm{avg}} = 3.70$) than their ABE
($C_{A,\mathrm{avg}} = 2.35$), agriculture
($C_{A,\mathrm{avg}} = 2.70$) and NAST
($C_{A,\mathrm{avg}} = 2.82$) counterparts in the country.
The Filipino physicists, however, have more collaborators
($C_{A,\mathrm{avg}} = 10.80$) than the computer scientists.
Similarly, international scientists have collaborated
significantly more than the Filipino computer scientists,
with biomedical researchers and high-energy physicists
averaging 18.1 and 173 collaborators, respectively. This
seemingly high number of collaborators in high-energy
physics is actually
achievable, as pointed out by Newman Q3], because 
of the significantly higher average number of authors
per paper in their community ($A_{P,\mathrm{avg}} = 8.96$).
Thus, the mega-collaboration average of 173 is actually
just a product of their high $A_P$. The Filipino CS
researchers, on the other hand, have collaborated with
almost the











Figure 4: The co-authorship graph CAG of Filipino computer scientists is a sociogram that shows
the professional relations between scientists involved in scientific research. The sociogram was drawn
using the two-dimensional force-directed algorithm by Kamada and Kawai [12]. The labels correspond
to some identified authors believed to be central persons in their respective subgraphs: (a) PC Naval,
(b) HN Adorna, (c) RP Saldana, (d) R Sison, (e) REO Roxas, and (f) D Cheng are the top researchers
with the largest number of collaborators; (g) is this journal’s editor-in-chief EA Albacea with his own
collaboration subgraph; and (h) is this paper’s author, who is also a central person in his own, although
small, subgraph. (This figure is in color in the digital format of this paper.)


same number of collaborators as their international
counterparts ($C_{A,\mathrm{avg}} = 3.59$).

3.7 Degree Distributions 

Figure 5 shows the respective degree ($\Delta_P$,
$\Delta_A$, and $\Delta_C$) frequencies of the vertices in
PAG and CAG, each plotted as a scatter plot (raw data)
and a line plot (predicted). Figure 5(a) shows the scatter
and predicted line plots of the frequency distribution of
the number of authors per paper in normal and log-log
scales. Here we see that the predicted line plot follows a
power law. The power law line that we found has the form
$y = 269.15\,(\Delta_P)^{-2.04}$ with $R^2 = 0.71$. Both
coefficients $c = 269.15$ and $\psi = -2.04$ are
significantly different from zero at the 1% significance
level, supporting our hypothesis that $p(\Delta_P)$ obeys
a power law distribution. We did not include the
distribution for $\Delta_P = 0$ because no paper could
have been written by zero authors (i.e., no paper has
missing author information).

Figure 5(b) shows the scatter and predicted line plots of
the frequency distribution of the number of papers per
author. Here we see that the line plot follows a power
law of the form $y = 138\,(\Delta_A)^{-1.88}$ with
$R^2 = 0.83$. The coefficients $c = 138$ and
$\psi = -1.88$ are significantly different from zero at
the 1% significance level. Here, we did not include the
distribution for $\Delta_A = 0$ because CS researchers
who have written zero papers are not included in the
archive. The distribution generally characterizes a high
number of authors who wrote a small number of papers, and
a small number of authors who wrote a very large number
of papers. Thus, in CS research in the Philippines, the
highly productive researchers make up a relatively small
fraction of all Filipino CS scientists. The power
$\psi = -1.88$ is in close agreement with Lotka’s law of
scientific productivity, whose exponent was found
empirically in 1926 to be approximately $-2$, while the
coefficient $c = 138$ characterizes the scientific
productivity of CS researchers in the Philippines.
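
One common way to obtain a power law fit of the form $y = c\,x^{\psi}$ is ordinary least squares on the log-log scale. The sketch below assumes that approach; the text does not state which fitting procedure was actually used, so the function name `fit_power_law`, the log-space $R^2$, and the synthetic data are all assumptions made for illustration.

```python
import math


def fit_power_law(degrees, counts):
    """Fit counts ~ c * degrees**psi by least squares in log-log space.

    Returns (c, psi, r_squared); r_squared is computed on the logs.
    Degrees and counts must all be positive.
    """
    if len(degrees) != len(counts) or len(degrees) < 2:
        raise ValueError("need at least two (degree, count) pairs")
    xs = [math.log(d) for d in degrees]
    ys = [math.log(k) for k in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    psi = sxy / sxx
    log_c = mean_y - psi * mean_x
    ss_res = sum((y - (log_c + psi * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(log_c), psi, r_squared


# Synthetic frequencies generated from an exact power law (not real data).
degrees = [1, 2, 4, 8]
counts = [100.0 * d ** -2 for d in degrees]
c, psi, r_squared = fit_power_law(degrees, counts)
```

Because degree zero has no logarithm, this formulation also makes explicit why the zero-degree bin must be left out of the fit.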

Figure 5(c) shows the degree frequency of the vertices
in CAG with a power law line fit of the form
$y = 251.03\,(\Delta_C)^{-1.65}$ having $R^2 = 0.80$. We
accept the power law as the best model for $p(\Delta_C)$
because we have found both $c = 251.033$ and
$\psi = -1.65$ to be significantly different from zero at
the 1% significance level. A power law fit suggests that:

1. Only a few authors have a large number of
co-authors in CAG. These authors act as information
hubs in the co-authorship graph, and therefore can
potentially control the flow of information through
the network. We deem such authors influential or
central. We have already identified these central
persons in §3.2 and §3.5.
Figure 5: The vertex degree distributions follow the power law. Blue squares mark the frequency of the
vertex degree while the red dashed line is the power law fit. INSET: The same scatter and line plots
in log-log scale. (a) $p(\Delta_P)$: $y = 269.15^{*}\,(\Delta_P)^{-2.04}$, $R^2 = 0.71$; (b) $p(\Delta_A)$: $y = 138^{*}\,(\Delta_A)^{-1.88^{*}}$, $R^2 = 0.83$; and
(c) $p(\Delta_C)$: $y = 251.03^{*}\,(\Delta_C)^{-1.65}$, $R^2 = 0.80$. *The estimated coefficients are significantly different from
zero at the $\alpha = 0.01$ level. (This figure is in color in the digital format of this paper.)


2. The co-authorship of CS research in the country
is scale invariant. This means that the properties
of CAG that we observed in this study, as well as
the patterns of co-authorship and publication, are
not expected to change much as the number of
authors M increases. This makes CAG a scale-free
graph.


3.8 Correlation Between $P_A$ and $C_A$

Figure 6(a) shows the scatter plot of $P_A$ against $C_A$.
The scatter plot shows that they are positively
correlated with $r = 0.7425$. This suggests that the
scientific productivity of the country’s CS researchers,
as measured by their number of papers, is correlated with
the researchers’ participation in collaborative research
efforts, as measured by their number of co-authors. A
highly productive scientist is likely to have a high
number of collaborators, and vice versa. This observation
is plausible for scientific publications because a larger
group of scientists has more manpower available for
writing papers.
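
The Pearson coefficient used here can be computed directly from its definition. The sketch below is a plain-Python illustration; the per-author values are hypothetical and are not drawn from the archive.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-author values: number of papers and of collaborators.
papers_per_author = [1, 1, 2, 5]
collaborators_per_author = [0, 2, 3, 9]
r = pearson_r(papers_per_author, collaborators_per_author)
```

A value of r near +1 indicates that authors with more papers also tend to have more collaborators; it does not by itself say which drives the other.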


3.9 Assortative mixing in CAG

Figure 6(b-c) shows the mixing plots that correlate the
$P_A$ and the $C_A$ of each researcher with those of the
researcher’s collaborators. These correlations quantify
how a computer scientist chooses a collaborator based on
the similarity or dissimilarity of their attributes.
Based on the Pearson correlation analysis, a computer
scientist tends, albeit weakly, to choose a collaborator
whose $P_A$ ($r = -0.1015$) or $C_A$ ($r = -0.0398$)
differs from his own. We therefore expect that a computer
scientist with a low $P_A$ will more likely choose a
collaborator whose $P_A$ is high.
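
A mixing correlation of this kind is commonly computed by taking, for every co-authorship edge, the attribute values at its two ends and correlating them across all edges, with each edge counted in both directions. The sketch below assumes that formulation, which may differ in detail from the procedure used for Figure 6; the names and the toy star-shaped graph are hypothetical.

```python
import math


def mixing_correlation(edges, attribute):
    """Pearson correlation of an attribute across the two ends of each edge.

    `edges` is a list of (u, v) pairs; `attribute` maps a vertex to a number.
    Each undirected edge contributes both (u, v) and (v, u).
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [attribute[u], attribute[v]]
        ys += [attribute[v], attribute[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when all edge ends share one value")
    return sxy / math.sqrt(sxx * syy)


# Toy star graph: one hub collaborating with three single-link authors.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
num_collaborators = {"hub": 3, "a": 1, "b": 1, "c": 1}
r_mix = mixing_correlation(edges, num_collaborators)
```

The star graph is the extreme disassortative case, where every link joins a high-degree vertex to a low-degree one; values near zero, such as those reported above, indicate only a slight tendency in that direction.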

4. SUMMARY AND CONCLUSION 

In this paper, we inferred two graphs, PAG and CAG,
from the author information of CS papers in the country
using various computational techniques. The graphs
were based on publication data from the various PCSC,
comprising 326 papers written by 605 authors. A large
number of calculations were performed on the graphs,
including the fundamental averages
$A_{P,\mathrm{avg}} = 2.64$, $P_{A,\mathrm{avg}} = 1.42$,
and $C_{A,\mathrm{avg}} = 3.70$. The respective frequency
distributions of these quantities follow a power law,
which suggests that most papers were written by
scientists with a small number of collaborators, while
few papers were authored by those with a large number of
collaborators. Specifically, the power $\psi = -1.88$ of
the frequency distribution for $P_A$ closely agrees with
Lotka’s law of scientific productivity. The productivity
of the scientists, as measured by $P_A$, is positively
correlated with their participation in collaborative
research efforts, as measured by $C_A$, suggesting that
highly productive scientists are more likely to have a
high number of collaborators, and scientists with a high
number of collaborators are more likely to be highly
Figure 6: (a) Scatter plot between the number of papers and the number of collaborators of each
author shows a positive correlation of $r = 0.7425$; (b) Mixing plot between the number of papers per
author shows a low negative correlation of $r = -0.1015$; and (c) Mixing plot between the number of
co-authors per author shows a low negative correlation of $r = -0.0398$.


productive. The assortativity tests show that scientists
tend, weakly, to conduct collaborative research with
scientists whose numbers of papers and collaborators
differ from their own. It is therefore reasonable
to suppose that the scientific enterprise in the CS
field in the Philippines would be given a significant
boost if collaboration among scientists were promoted
(e.g., through governmental policies and other
programs).

The following efforts are already underway as extensions 
to this research endeavor: 

1. Time study to measure the dynamics and evolution
of CAG. The current effort did not measure how
CAG has evolved into what it is today. Thus, the
extended study tests several hypotheses regarding
the nature of the development of CAG, including
the social phenomenon called preferential
attachment. Preferential attachment, also known
as the “rich get richer” adage, is the tendency of
new scientists to build collaborations with prolific
scientists, and then later on seek more
collaborations with other prolific ones. These
tendencies allow scientists with a high number of
papers to write more papers in a given time than
others.


2. Development of a National Researcher Database 
System. Due to the inherent name ambiguity 
encountered in the conduct of this research, 
it is recommended that a National Researcher 
Database System (NRDS) be developed. NRDS 
will keep track of the changes in names used by 
a researcher, and at the same time be a reposi¬ 
tory of scientific articles in the Philippines. The 
content of the repository may be used as the 
National Index of Scientific papers in the Philip¬ 
pines. With the NRDS, a citation network may 
also be inferred to compute the impact factor, not 
only of the journals and proceedings, but also of 
the papers themselves. 